42 research outputs found

    A flexible Bayesian tool for CoDa mixed models: logistic-normal distribution with Dirichlet covariance

    Compositional Data Analysis (CoDa) has gained popularity in recent years. This type of data consists of values from disjoint categories that sum to a constant. Both Dirichlet regression and logistic-normal regression have become popular as CoDa analysis methods. However, fitting this kind of multivariate model presents challenges, especially when structured random effects, such as temporal or spatial effects, are included in the model. To overcome these challenges, we propose the logistic-normal Dirichlet Model (LNDM). We incorporate this approach into the R-INLA package, facilitating model fitting and model prediction within the framework of Latent Gaussian Models (LGMs). Moreover, we explore metrics such as the Deviance Information Criterion (DIC), the Watanabe-Akaike information criterion (WAIC), and the cross-validation measure conditional predictive ordinate (CPO) for model selection in R-INLA for CoDa. Illustrating LNDM through a simple simulated example and an ecological case study on Arabidopsis thaliana in the Iberian Peninsula, we underscore its potential as an effective tool for managing CoDa and large CoDa databases.
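
    As a rough illustration of the compositional setting this abstract describes, the sketch below rescales raw category values so they sum to a constant and maps the result to additive log-ratio (alr) coordinates, the scale on which a logistic-normal model places a multivariate normal distribution. The function names are hypothetical, the choice of the last part as reference is an assumption, and the abstract does not state which log-ratio transform LNDM uses.

```python
import math


def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total` (a composition)."""
    s = sum(parts)
    if s <= 0 or any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    return [total * p / s for p in parts]


def alr(composition):
    """Additive log-ratio coordinates, using the last part as reference (assumption)."""
    ref = composition[-1]
    return [math.log(p / ref) for p in composition[:-1]]


def alr_inverse(coords):
    """Map alr coordinates back to a composition that sums to 1."""
    expd = [math.exp(c) for c in coords] + [1.0]
    s = sum(expd)
    return [e / s for e in expd]


composition = closure([2.0, 3.0, 5.0])   # three disjoint categories
coords = alr(composition)                # two unconstrained real coordinates
recovered = alr_inverse(coords)          # back on the simplex
```

    A composition with D parts has D - 1 alr coordinates, which is why the constrained data can be modelled with an unconstrained Gaussian on that scale.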

    Recent statistical advances and applications of species distribution modeling

    In the world we live in, we produce approximately 2.5 quintillion bytes of data per day. This huge amount of data comes from social media, the internet, satellites, etc. All these data, which can be recorded in time or in space, are information that can help us understand the spread of a disease, the movement of species, or climate change. The use of complex statistical models has recently increased in the study of species distributions. This complexity has made inferential and predictive processes challenging to perform. The Bayesian approach has become a good option for dealing with these models, because prior information can be incorporated easily and because it provides a more realistic and accurate estimation of uncertainty. This thesis provides an updated view of the latest statistical tools that have emerged in the application of species distribution models (SDMs) in real contexts from a Bayesian perspective, and develops new methodological tools to solve some statistical problems that appeared in that process.
    With regard to the application of the latest statistical tools in the context of SDMs, the specific objectives have been to model the production of Plurivorosphaerella nawae ascospores in persimmon leaf litter; to study the spatial and climatic factors associated with the distribution of citrus black spot disease, caused by the fungus Phyllosticta citricarpa; to analyze the effects of geographic genetic structure and spatial autocorrelation on species distribution range shifts; and to study the distribution of the bottlenose dolphin (Tursiops truncatus). Two goals have guided the more methodological part of the thesis: a review focused on the statistical issues in species distribution modeling, and the implementation of Bayesian Dirichlet regression in the context of the integrated nested Laplace approximation (INLA).
    These objectives give the thesis, which is a compendium of eight papers, the following structure. The first four chapters present a general introduction, including a description of the objectives (Chapter 1), the basis of the methodology employed (Chapters 2 and 3), and a description of the results obtained (Chapter 4). The next eight chapters contain the papers that make up the compendium. In Chapter 5, a hierarchical Bayesian beta regression is constructed to fit the dynamics of Plurivorosphaerella nawae ascospore production in leaf litter. Chapters 6, 7 and 8 use geostatistical tools and hierarchical Bayesian logistic regression models to study the spatial and climatic factors associated with the distribution of citrus black spot disease. In Chapter 9, spatial hierarchical Bayesian beta regression models are developed to analyze the effects of geographic genetic structure and spatial autocorrelation on species distribution range shifts. In Chapter 10, a non-stationary hierarchical Bayesian logistic model is employed to study the distribution of the bottlenose dolphin (Tursiops truncatus). Chapters 11 and 12 cover the more methodological part of the thesis: a review focused on the statistical issues in species distribution modeling (Chapter 11), and a way to implement Bayesian Dirichlet regression in the context of the integrated nested Laplace approximation (Chapter 12). Finally, Chapter 13 presents some conclusions and future lines of research, followed by a general bibliography for the introductory chapters.

    A Decision Support System Based on Degree-Days to Initiate Fungicide Spray Programs for Peach Powdery Mildew in Catalonia, Spain

    The incidence of peach powdery mildew (PPM) on fruit was monitored in commercial peach orchards to i) describe the disease progress in relation to several environmental parameters and ii) establish an operating threshold to initiate a fungicide spray program based on accumulated degree-day (ADD) data. A beta-regression model for disease incidence showed a substantial contribution of the random effects orchard and year, whereas relevant fixed effects corresponded to ADD, wetness duration, and ADD considering vapor pressure deficit and rain. When beta-regression models were fitted for each orchard and year considering only ADD, disease onset was observed at 242 ± 13 ADD and symptoms did not develop further after 484 ± 42 ADD. An operating threshold to initiate fungicide applications was established at 220 ADD, coinciding with a PPM incidence in fruit around 0.05. A validation was further conducted by comparing PPM incidence in i) a standard, calendar-based program, ii) a program with applications initiated at 220 ADD, and iii) a nontreated control. A statistically relevant reduction in disease incidence in fruit was obtained with both fungicide programs, from 0.244 recorded in the control to 0.073 with the 220-ADD alert program, and 0.049 with the standard program. The 220-ADD alert program resulted in 33% reduction in fungicide applications.
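
    The 220-ADD alert described in this abstract can be sketched as a running sum of daily degree-days compared against the threshold. This is only a minimal illustration: the abstract does not give the base temperature, the start date of accumulation, or the exact degree-day formula, so the simple daily-mean method, the 10 °C base, and the function names below are all assumptions.

```python
def accumulated_degree_days(daily_mean_temps, base_temp):
    """Running sum of daily degree-days, using the simple daily-mean method (assumption)."""
    total = 0.0
    series = []
    for temp in daily_mean_temps:
        # Days colder than the base temperature contribute nothing.
        total += max(0.0, temp - base_temp)
        series.append(total)
    return series


def first_day_reaching(add_series, threshold=220.0):
    """Index of the first day whose accumulated degree-days reach the threshold."""
    for day, value in enumerate(add_series):
        if value >= threshold:
            return day
    return None  # threshold not reached within the series


# Hypothetical input: 30 days at a constant 20 °C mean, with an assumed 10 °C base.
temps = [20.0] * 30
add_series = accumulated_degree_days(temps, base_temp=10.0)
alert_day = first_day_reaching(add_series)  # zero-based day index
```

    With these made-up inputs each day adds 10 degree-days, so the alert fires on the 22nd day of the series (index 21).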

    The Integrated nested Laplace approximation for fitting models with multivariate response

    This paper introduces a Laplace approximation to Bayesian inference in regression models for multivariate response variables. We focus on Dirichlet regression models, which can be used to analyze a set of variables on a simplex exhibiting skewness and heteroscedasticity, without having to transform the data. These data, which mainly consist of proportions or percentages of disjoint categories, are widely known as compositional data and are common in areas such as ecology, geology, and psychology. We provide both the theoretical foundations and a description of how this Laplace approximation can be implemented in the case of Dirichlet regression. The paper also introduces the R package dirinla, which extends the INLA package; INLA cannot deal directly with multivariate likelihoods such as the Dirichlet likelihood. Simulation studies are presented to validate the good behaviour of the proposed method, while a real-data case study is used to show how this approach can be applied.
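
    To make the Dirichlet likelihood mentioned in this abstract concrete, the sketch below evaluates the standard Dirichlet log-density for one compositional observation and maps linear predictors to concentration parameters through a log link. The function names are hypothetical and the log link is an assumption; the abstract does not state the parameterization used by dirinla, and this is not that package's code.

```python
import math


def dirichlet_log_density(y, alpha):
    """Log-density of a Dirichlet(alpha) distribution at the composition y."""
    if len(y) != len(alpha):
        raise ValueError("y and alpha must have the same length")
    if any(v <= 0 for v in y) or abs(sum(y) - 1.0) > 1e-9:
        raise ValueError("y must be strictly positive and sum to 1")
    if any(a <= 0 for a in alpha):
        raise ValueError("alpha must be strictly positive")
    # Log of the normalizing constant: log Gamma(sum alpha) - sum log Gamma(alpha_i).
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, y))


def alphas_from_linear_predictors(etas):
    """Log link (assumption): each concentration parameter is exp(linear predictor)."""
    return [math.exp(eta) for eta in etas]


# With all linear predictors at zero the distribution is uniform on the simplex.
alpha = alphas_from_linear_predictors([0.0, 0.0, 0.0])
log_density = dirichlet_log_density([0.2, 0.3, 0.5], alpha)
```

    In a regression setting each linear predictor would depend on covariates, so the concentration parameters, and with them the mean and spread of the proportions, vary by observation.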

    A hierarchical Bayesian Beta regression approach to study the effects of geographical genetic structure and spatial autocorrelation on species distribution range shifts

    Global climate change (GCC) may be causing distribution range shifts in many organisms worldwide. Multiple efforts are currently focused on the development of models to better predict distribution range shifts due to GCC. We addressed this issue by including intraspecific genetic structure and spatial autocorrelation (SAC) of data in distribution range models. Both factors reflect the joint effect of ecoevolutionary processes on the geographical heterogeneity of populations. We used a collection of 301 georeferenced accessions of the annual plant Arabidopsis thaliana in its Iberian Peninsula range, where the species shows strong geographical genetic structure. We developed spatial and nonspatial hierarchical Bayesian models (HBMs) to depict current and future distribution ranges for the four genetic clusters detected. We also compared the performance of HBMs with Maxent (a presence-only model). Maxent and nonspatial HBMs presented some shortcomings, such as the loss of accessions with high genetic admixture in the case of Maxent and the presence of residual SAC for both. As spatial HBMs removed residual SAC, these models showed higher accuracy than nonspatial HBMs and handled the spatial effect on model outcomes. The ease of modelling and the consistency among model outputs for each genetic cluster were conditioned by the sparseness of the populations across the distribution range. Our HBMs enrich the toolbox of software available to evaluate GCC-induced distribution range shifts by considering both genetic heterogeneity and SAC, two inherent properties of any organism that should not be overlooked.

    Climatic distribution of citrus black spot caused by 'Phyllosticta citricarpa'. A historical analysis of disease spread in South Africa

    Citrus black spot (CBS), caused by Phyllosticta citricarpa, is one of the main fungal diseases of citrus worldwide. The Mediterranean Basin is free of the disease and thus phytosanitary measures are in place to avoid the entry of P. citricarpa into the EU territory. However, the suitability of the climates present in the Mediterranean Basin for CBS establishment and spread is debated. As a case study, an analysis of climate types and environmental variables in South Africa was conducted to identify potential associations with CBS distribution. The spread of the disease was traced, and georeferenced datasets of CBS distribution and environmental variables were assembled. In 1950 CBS was still confined to areas of temperate climates with summer rainfall (Cw, Cf), but it spread afterwards to neighbouring regions with markedly drier conditions. Currently, the hot arid steppe (BSh) is the predominant climate where CBS develops in South Africa. The disease was not detected in the Mediterranean-type climates Csa and Csb as defined by the Köppen-Geiger system and the more restrictive Aschmann's classification criteria. However, arid steppe (BS) climates, where CBS is prevalent in South Africa, are common in important citrus areas in the Mediterranean Basin. The most noticeable change in the environmental range occupied by CBS in South Africa was in the amount and seasonality of rainfall. Due to the spread of the disease to drier regions, the minimum annual precipitation in CBS-affected areas declined from 663 mm in 1950 to 339 mm at present. The minimum precipitation of the warmest quarter also declined, from 290 to 96 mm. Strong spatial autocorrelation in CBS distribution data was detected, so further modelling efforts should consider the relative contribution of environmental variables and spatial effects to estimate the potential geographical range of CBS.
